Building a Collation Element Table for a Large Chinese Character Set in YES

Authors

  • Xiaoheng Zhang
  • Xiaotong Li
Abstract

YES is a simplified stroke-based method for sorting Chinese characters. It requires no stroke counting or grouping, and is thus much faster and more accurate than the traditional method. This paper presents a collation element table built in YES for a large joint Chinese character set covering (a) all 20,902 characters of Unicode CJK Unified Ideographs, (b) all 11,408 characters in the Complete List of Chinese Characters Used by the Media in 2013, and (c) all of the more than 13,000 characters in the latest versions of the Xinhua Dictionary (v11) and the Contemporary Chinese Dictionary (v6). Of the 20,902 Chinese characters in Unicode, 97.23% have a one-to-one relationship with their stroke order codes in YES, compared with 90.69% for the traditional method. With stroke layout and Unicode value added as secondary and tertiary sorting levels, a one-to-one relationship between characters and collation elements is guaranteed. The collation element table has been successfully applied to sorting CC-CEDICT, a Chinese-English dictionary of over 112,000 word entries.
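
The three sorting levels described in the abstract can be illustrated with a minimal Python sketch. The stroke order and stroke layout codes below are made-up placeholders, not real YES codes, and the names `CollationElement`, `TOY_CODES`, `build_table`, and `sort_key` are hypothetical. Comparing whole words level by level is also an assumption, since the abstract does not say how multi-character entries are compared.

```python
from typing import Dict, NamedTuple, Tuple


class CollationElement(NamedTuple):
    stroke_order: str   # primary level (placeholder code, NOT a real YES code)
    stroke_layout: str  # secondary level (placeholder code)
    code_point: int     # tertiary level: the character's Unicode value


# Hypothetical toy data: character -> (stroke order code, stroke layout code).
TOY_CODES: Dict[str, Tuple[str, str]] = {
    "一": ("a", "x"),
    "人": ("ab", "x"),
    "入": ("ab", "x"),  # ties with 人 on levels 1 and 2 in this toy table
    "八": ("ab", "y"),  # ties with 人 on level 1 only
}


def build_table(codes: Dict[str, Tuple[str, str]]) -> Dict[str, CollationElement]:
    """Attach the Unicode value as the third level of every element."""
    return {ch: CollationElement(so, lay, ord(ch)) for ch, (so, lay) in codes.items()}


def sort_key(word: str, table: Dict[str, CollationElement]) -> tuple:
    """Build a key compared level by level: all primaries, then secondaries, then tertiaries."""
    elements = [table[ch] for ch in word]
    return (
        tuple(e.stroke_order for e in elements),
        tuple(e.stroke_layout for e in elements),
        tuple(e.code_point for e in elements),
    )


TABLE = build_table(TOY_CODES)
words = ["八", "入", "人", "一"]
sorted_words = sorted(words, key=lambda w: sort_key(w, TABLE))
print(sorted_words)
```

Because every character has a distinct Unicode value, no two elements in such a table can be identical, which is what makes the one-to-one guarantee hold regardless of how many characters share a stroke order code.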

Similar resources

Groups whose Bipartite Divisor Graph for Character Degrees Has Five Vertices

Let $G$ be a finite group and $cd^*(G)$ be the set of nonlinear irreducible character degrees of $G$. Suppose that $\rho(G)$ denotes the set of primes dividing some element of $cd^*(G)$. The bipartite divisor graph for the set of character degrees, denoted by $B(G)$, is a bipartite graph whose vertices are the disjoint union of $\rho(G)$ and $cd^*(G)$, and a vertex $p \in \rho(G)$ is conne...

Groups whose set of vanishing elements is exactly a conjugacy class

Let $G$ be a finite group. We say that an element $g$ in $G$ is a vanishing element if there exists some irreducible character $\chi$ of $G$ such that $\chi(g)=0$. In this paper, we classify groups whose set of vanishing elements is exactly a conjugacy class.

基於對照表以及語言模型之簡繁字體轉換 (Chinese Characters Conversion System based on Lookup Table and Language Model) [In Chinese]

The character sets used in China and Taiwan are both Chinese, but they are divided into simplified and traditional Chinese characters. There is a large amount of information exchange between China and Taiwan through books and the Internet. To provide readers with a convenient reading environment, character conversion between simplified and traditional Chinese is necessary. The conversion between simp...

Basic Elements Knowledge Acquisition Study in the Chinese Character Intelligent Formation System

In a Chinese character intelligent formation system without a Chinese character library, the same basic element may differ in position, size, and shape across different Chinese characters. The geometric transformation from basic elements to the components of Chinese characters can be realized by affine transformation; the transformation knowledge acquisition is the premise of Ch...

Torsion Analysis of High-Rise Buildings using Quadrilateral Panel Elements with Drilling D.O.F.s

Generally, the finite element method is a powerful procedure for the analysis of tall buildings. Yet there are some problems in applying many finite elements to the analysis of tall building structures. The presence of artificial flexure and parasitic shear effects in many lower order plane stress and membrane elements causes the numerical procedure to converge in...

Publication date: 2015